TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
Similar resources
Mean and variance responsive learning
Decision makers are often described as seeking higher expected payoffs and avoiding higher variance in payoffs. We provide some necessary and some sufficient conditions for learning rules, that assume the agent has little prior and feedback information about the environment, to reflect such preferences. We adopt the framework of Börgers, Morales and Sarin (2004, Econometrica) who provide similar re...
Asymptotic algorithm for computing the sample variance of interval data
The problem of the sample variance computation for epistemic interval-valued data is, in general, NP-hard. Therefore, known efficient algorithms for computing variance require strong restrictions on admissible intervals like the no-subset property or heavy limitations on the number of possible intersections between intervals. A new asymptotic algorithm for computing the upper bound of the samp...
Variance Reduction Methods for Sublinear Reinforcement Learning
This work considers the problem of provably optimal reinforcement learning for (episodic) finite horizon MDPs, i.e. how an agent learns to maximize his/her (long term) reward in an uncertain environment. The main contribution is in providing a novel algorithm, Variance-reduced Upper Confidence Q-learning (vUCQ), which enjoys a regret bound of Õ(√(HSAT) + HSA), where T is the number of time...
The kNN-TD Reinforcement Learning Algorithm
A reinforcement learning algorithm called kNN-TD is introduced. This algorithm has been developed using the classical formulation of temporal difference methods and a k-nearest neighbors scheme as its expectations memory. By means of this kind of memory the algorithm is able to generalize properly over continuous state spaces and also benefit from collective action selection and learning ...
The Tail Mean-Variance Model and Extended Efficient Frontier
In portfolio theory, it is well known that the distributions of stock returns often have non-Gaussian characteristics. Therefore, we need non-symmetric distributions for modeling and accurate analysis of actuarial data. For this purpose and for optimal portfolio selection, we use the Tail Mean-Variance (TMV) model, which focuses on rare risks with high losses that usually occur in the tail of r...
Journal
Journal title: Transactions of the Japanese Society for Artificial Intelligence
Year: 2001
ISSN: 1346-0714,1346-8030
DOI: 10.1527/tjsai.16.353